Add ability to enforce concurrent query limits by lbschanno · Pull Request #3257 · NationalSecurityAgency/datawave

lbschanno · 2025-10-31T18:14:28Z

Configuration

Add the ability to enforce concurrent query limits across a group of webservers. Zookeeper is used to track active queries and the following data:

The query ID
The user who submitted the query
The system the query was submitted on
The query logic the query originated from

Query limit enforcement is done through the QueryLimiter class. Given a user, system, and query logic, it can determine if any of the following limits have been exceeded:

The max allowed concurrent queries for the user.
The max allowed concurrent queries of the query logic for the user.
The max allowed concurrent queries for the system.
The max allowed concurrent queries of the query logic for the system.
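
To make the four checks above concrete, here is a minimal sketch. It is not the actual QueryLimiter code: the class `LimitCheckSketch`, the `Usage` type, the check order, and the reading of "exceeded" as "admitting one more query would go over the limit" are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.OptionalInt;

/** Illustrative only: hypothetical names, not the classes added in this PR. */
final class LimitCheckSketch {

    /** An active query count paired with its configured limit (empty = no limit). */
    static final class Usage {
        final int active;
        final OptionalInt limit;

        Usage(int active, OptionalInt limit) {
            this.active = active;
            this.limit = limit;
        }

        /** Assumption: a limit is hit when one more query would exceed it. */
        boolean full() {
            return limit.isPresent() && active >= limit.getAsInt();
        }
    }

    /** Returns the name of the first limit a new query would exceed, if any. */
    static Optional<String> firstExceeded(Usage user, Usage userLogic, Usage system, Usage systemLogic) {
        // LinkedHashMap keeps the (assumed) check order stable.
        Map<String, Usage> checks = new LinkedHashMap<>();
        checks.put("user", user);
        checks.put("user/queryLogic", userLogic);
        checks.put("system", system);
        checks.put("system/queryLogic", systemLogic);
        return checks.entrySet().stream()
                .filter(e -> e.getValue().full())
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```

A caller would build one `Usage` per limit from the current active-query counts and reject the query when `firstExceeded` returns a value.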

Limits may be defined and customized on a per-user and per-system basis. They may also be defined for groups of query logics. The classes UserLimitProvider, SystemLimitProvider, and QueryLogicGroupLimitProvider are responsible for identifying the best limits to enforce for a user, a system, and a query logic, respectively. The QueryLimiter initializes them from the QueryLimitConfiguration instance it is given. The following can be configured:

On a system-wide basis:

The default concurrent user query limit. This applies to the total number of queries a user may run across all systems. May be overridden per user.
The default concurrent system query limit. Primarily to avoid a system getting overloaded. May be overridden per system.

On a per-system basis:

The system name/ids the configuration targets. Regex matching is supported. Pattern uniqueness is enforced.
The concurrent system query limit. Overrides the system-wide value.
Whether queries submitted to the system count towards a user's concurrent query total. Defaults to true.
The concurrent system query limit for different query logic groups. Regex matching against group names is supported. Pattern uniqueness is enforced.

On a per-user basis:

The user DN.
The user's concurrent query limit. Overrides the system-wide configuration.
The user's concurrent query limit for different query logic groups. Regex matching against group names is supported.

On a per-query-logic-group basis:

The group name.
The query logics included in the group. Regex matching is supported. Pattern uniqueness is enforced.
The default concurrent user query limit. This applies to the total concurrent queries a user may run that originate from a query logic in the group across all systems.

Given the possibilities for exact matches, partial regex matches, and wildcard regex matches, the determination of the best limit to use for any particular system, query logic, or query logic group is done by sorting matches into the following 'matching buckets' (in best-match priority):

Exact match: We attempt to find an exact match first and use the associated limit.
Partial regex (non-wildcard-only): If we cannot find an exact match, then we attempt to find all partial matches, and see if any of their limits are met.
Wildcard-only regex: In the case of no exact or partial matches, we use the wildcard match with the lowest limit.
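
The bucket priority above can be sketched as follows. This is an illustration under stated assumptions, not the PR's provider code: `BestMatchSketch` is a hypothetical name, "wildcard-only" is taken to mean the pattern `.*`, and the lowest limit is chosen within whichever bucket is the best one that has a match.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.regex.Pattern;

/** Illustrative only: hypothetical helper, not a class from this PR. */
final class BestMatchSketch {

    /** Assumption: a wildcard-only pattern is exactly ".*". */
    static boolean isWildcardOnly(String pattern) {
        return pattern.equals(".*");
    }

    /** Picks the lowest limit from the best bucket that has a match for the name. */
    static OptionalInt bestLimit(String name, Map<String, Integer> limitsByPattern) {
        OptionalInt exact = OptionalInt.empty();
        OptionalInt partial = OptionalInt.empty();
        OptionalInt wildcard = OptionalInt.empty();
        for (Map.Entry<String, Integer> e : limitsByPattern.entrySet()) {
            String pattern = e.getKey();
            int limit = e.getValue();
            if (pattern.equals(name)) {
                exact = min(exact, limit);        // bucket 1: exact match
            } else if (isWildcardOnly(pattern)) {
                wildcard = min(wildcard, limit);  // bucket 3: wildcard-only
            } else if (Pattern.matches(pattern, name)) {
                partial = min(partial, limit);    // bucket 2: partial regex
            }
        }
        if (exact.isPresent()) {
            return exact;
        }
        return partial.isPresent() ? partial : wildcard;
    }

    private static OptionalInt min(OptionalInt current, int candidate) {
        return OptionalInt.of(current.isPresent() ? Math.min(current.getAsInt(), candidate) : candidate);
    }
}
```

With the patterns `EventQuery` (limit 10), `Event.*` (limit 5), and `.*` (limit 2), the name `EventQuery` resolves to 10, `EventCount` to 5, and any other name to 2.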

Implementation

Checking limits and marking queries as active or inactive is done through the QueryLimiter class. The three main methods are:

QueryLimiter.checkForLimits()
QueryLimiter.countQueryTowardsLimits()
QueryLimiter.stopCountingQueryTowardsLimits()

When a query is marked as active via QueryLimiter.countQueryTowardsLimits(), it will delegate to the ActiveQueryTracker, which will in turn create nodes in Zookeeper under the ActiveQueries namespace. When ActiveQueryTracker.trackQuery() is called, the following nodes are created:

# Container nodes
/users/<userDn>/<queryLogic>          # Only for queries on systems that count towards the user limit
/systems/<system>/<queryLogic>
/distinctQueryLogics/<queryLogic>     # Only created if it does not already exist

# Ephemeral nodes. These are deleted automatically if their associated Zookeeper connection goes down.
/users/<userDn>/<queryLogic>/<queryId>    # Only for queries on systems that count towards the user limit
/systems/<system>/<queryLogic>/<queryId>

ActiveQueryTracker.trackQuery() returns a QueryHeartbeat that contains a list of PersistentNode wrappers (provided by the Apache Curator library) around the ephemeral nodes listed above. The QueryHeartbeat maintains the connection to Zookeeper and attempts to keep the ephemeral nodes present until QueryHeartbeat.stop() is called. Once QueryHeartbeat.stop() is called, or if the webserver crashes, the ephemeral nodes are removed from Zookeeper automatically.
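
As a rough illustration of the node layout, the sketch below only assembles the ephemeral node paths for one query. The class `ActiveQueryPathsSketch` and its method are hypothetical, it makes no Curator or Zookeeper calls, and it does not escape characters in the user DN that may be unsuitable for a node name.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: builds path strings, does not talk to Zookeeper. */
final class ActiveQueryPathsSketch {

    /**
     * Returns the per-query ephemeral node paths described in this PR.
     * The user path is included only when the system counts towards the user limit.
     */
    static List<String> ephemeralPaths(String userDn, String system, String queryLogic,
                                       String queryId, boolean countsTowardsUserLimit) {
        List<String> paths = new ArrayList<>();
        if (countsTowardsUserLimit) {
            paths.add("/users/" + userDn + "/" + queryLogic + "/" + queryId);
        }
        paths.add("/systems/" + system + "/" + queryLogic + "/" + queryId);
        return paths;
    }
}
```

Under this layout, the number of active queries for a user or system and query logic is the number of children under the matching container node.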

The following error codes have been added:

412-20  - Concurrent query limit exceeded
500-164 - Error checking concurrent query limits

Closes #3100

Add the ability to enforce concurrent query limits across a group of webservers. Zookeeper is used to track active queries and the following data:

- The query ID
- The user who submitted the query
- The system the query was submitted on
- The query logic the query originated from

When the `ActiveQueryTracker` is instructed to track a query, the following nodes will be created in Zookeeper under the 'ActiveQueries' namespace:

```
/users/<userDn>/<queryId>
/systems/<systemName>/<queryId>
/queryLogics/<queryLogic>/<queryId>
/queries/<queryId>
/queries/<queryId>/user [data = byte[] value of userDn]
/queries/<queryId>/system [data = byte[] value of systemName]
/queries/<queryId>/queryLogic [data = byte[] value of queryLogic]
/queries/<queryId>/heartbeats
```

This is done through the use of the `ActiveQueryTracker` class. In addition to managing the nodes that record information about the query, the `ActiveQueryTracker` class is also responsible for providing instances of the `QueryHeartbeat` class. A `QueryHeartbeat` is a wrapper around an ephemeral PersistentNode, provided by the Apache Curator library. As long as this node is present in Zookeeper for a particular query, the query will be considered to be active. Should the webservers fail over and the Zookeeper connection drop, these heartbeat nodes will automatically be deleted by Zookeeper.

The `ActiveQueryTracker` is also responsible for providing instances of the `ActiveQuerySnapshot` class, which represent a snapshot of total active queries at a point in time that are associated with a particular user, system, or query logic.

Query limit enforcement is done through the `QueryLimiter` class. Given a user, system, and query logic, it can determine if any of the following limits have been exceeded:

- The max allowed concurrent queries for the user.
- The max allowed concurrent queries of the query logic for the user.
- The max allowed concurrent queries for the system.
- The max allowed concurrent queries of the query logic for the system.

Limits may be defined and customized on a per-user and per-system basis. They may also be defined for groups of query logics. The classes `UserLimitProvider`, `SystemLimitProvider`, and `QueryLogicGroupLimitProvider` are respectively responsible for identifying the best limits to enforce for a user, system, and query logic. They will be initialized in the `QueryLimiter` after providing a `QueryLimitConfiguration` instance. The following can be configured:

On a system-wide basis:

- The default concurrent user query limit. This applies to the total number of queries a user may run across all systems. May be overridden per user.
- The default concurrent system query limit. Primarily to avoid a system getting overloaded. May be overridden per system.
- The default of whether queries submitted to a system are counted towards the user's concurrent query total. This is always true.

On a per-system basis:

- The system name/ids the configuration targets. Regex matching is supported.
- The concurrent system query limit. Overrides the system-wide value.
- Whether queries submitted to the system count towards a user's concurrent query total. Overrides the system-wide value.
- The concurrent system query limit for different query logic groups. Regex matching against group names is supported.

On a per-user basis:

- The user DN.
- The user's concurrent query limit. Overrides the system-wide configuration.
- The user's concurrent query limit for different query logic groups. Regex matching against group names is supported.

On a per-query-logic-group basis:

- The group name.
- The query logics included in the group. Regex matching is supported.
- The default concurrent user query limit. This applies to the total concurrent queries a user may run that originate from a query logic in the group across all systems.

Given the possibilities for exact matches, partial regex matches, and wildcard regex matches, the determination of the best limit to use for any particular system or query logic is done by sorting matches into the following 'matching buckets' (in best-match priority):

1. Exact match
2. Partial regex (non-wildcard-only)
3. Wildcard-only regex

and then selecting the lowest limit from the best bucket where we first found a match.

Currently the `QueryLimiter` is used in `QueryExecutorBean`, along with a `QueryHeartbeatCache` instance to cache heartbeats and keep them alive when a running query is cached for retrieval later. For the purposes of this feature, a query is considered to start when an Accumulo connection is retrieved from the connection factory, and is considered to end when the connection is returned to the factory.

The following error codes have been added:

412-20 - Concurrent query limit exceeded
500-164 - Error checking concurrent query limits

Closes #3100

ivakegg

I like what I am seeing in here. I am thinking about potential error scenarios where things could get out of sync. Currently we have the following maps related to queries:

QueryCache: A map of query id to RunningQuery instances. Used by QueryExpirationBean to check for expired queries.
HeartbeatCache: A map of query id to heartbeat.

It might be worth adding a loop in the QueryExpirationBean that goes through the ids in the HeartbeatCache to verify that those queries are still running per the QueryCache. This would help to prevent anything getting out of sync if we have some unknown error cases that are not being accounted for. Other than that I am good to start integration testing this.
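
A minimal sketch of the suggested safeguard, assuming both caches can be viewed as plain maps keyed by query id. `HeartbeatSyncSketch` and its `Heartbeat` interface are hypothetical stand-ins, not the QueryExpirationBean, QueryCache, or QueryHeartbeat types themselves.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Illustrative only: hypothetical stand-in for the synchronization loop. */
final class HeartbeatSyncSketch {

    /** Stand-in for QueryHeartbeat; only stop() is modeled here. */
    interface Heartbeat {
        void stop();
    }

    /**
     * Stops and evicts every heartbeat whose query id is no longer running.
     * Returns the number of heartbeats evicted.
     */
    static int evictOrphans(Map<String, Heartbeat> heartbeatCache, Set<String> runningQueryIds) {
        int evicted = 0;
        Iterator<Map.Entry<String, Heartbeat>> it = heartbeatCache.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Heartbeat> entry = it.next();
            if (!runningQueryIds.contains(entry.getKey())) {
                entry.getValue().stop(); // release the query's ephemeral nodes
                it.remove();             // safe removal while iterating
                evicted++;
            }
        }
        return evicted;
    }
}
```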

lbschanno · 2026-01-26T16:23:53Z

@ivakegg understood, I'll add in a loop to the QueryExpirationBean for synchronization with the heartbeat cache.

lbschanno · 2026-01-27T00:03:50Z

Added synchronization safeguard.

ivakegg · 2026-03-03T09:30:21Z

So I got this up and running, and while watching the Zookeeper entries I realized that one requirement was not translated correctly. I apologize for not realizing this earlier. The "system" limit is supposed to be based on the "systemFrom" query parameter and not the hostname on which the query is running. I will take a shot at modifying the code accordingly.

lbschanno marked this pull request as draft October 31, 2025 18:15

lbschanno mentioned this pull request Oct 31, 2025

Limit the total number of queries a user can run concurrently in the system #3100

Open

lbschanno force-pushed the task/queryLimit branch from 156ed51 to 9c59e98 Compare October 31, 2025 18:33

lbschanno and others added 25 commits December 1, 2025 14:00

Tweak spring configurations

f91e18b

Merge branch 'integration' into task/queryLimit

5f3bf04

Fix issues with CDI

542ab09

Remove unnecessary annotations in QueryLimiter, use init-method on qu…

e26fc0b

…eryLimiter in QueryLimiterFactory.xml, remove scope attribute on queryLimiter and queryLimitConfiguration

Add logging handler for limit package

21fdb65

Simplify QueryHeartbeatCache and add logging

cd6d9d0

Use hostname for server info

47c020f

Allow injection of hostname and fix tests

40a84f6

Code formatting

88c2af9

Fix issues with query logic matching

67dc258

Merge branch 'integration' into task/queryLimit

23e49ca

Fix javadoc issues

33f9d1a

Improve query logic group filtering

de653e8

Ensure all query-specific nodes are ephemeral

4e632ed

Add listener for automatic heartbeat cache eviction

ceb3da0

Add methods to RunningQuery for stopping heartbeat

2f98616

Add logger for QueryHeartbeatCache

8e0d769

Merge branch 'integration' into task/queryLimit

a7fb51d

Move QueryHeartbeatCache into QueryLimiter

40299ec

Revert changes to docker-compose.yml

adb6d4a

Fix property name typo

2b9e6ac

Mark query as finished when there are no more results

ad5bc61

Add examples in QueryLimiterFactory.xml

7618494

Merge branch 'integration' into task/queryLimit

37c4469

Fix failing unit test

8b3307d

lbschanno marked this pull request as ready for review December 12, 2025 22:05

lbschanno added 2 commits January 6, 2026 04:46

Turn off trace logging for limit package

4b023ff

Merge branch 'integration' into task/queryLimit

d4fdc3a

foster33 reviewed Jan 6, 2026

...vices/cached-results/src/main/java/datawave/webservice/results/cached/CachedResultsBean.java (outdated, resolved)

web-services/query/src/main/java/datawave/webservice/query/limit/ActiveQueryTracker.java (outdated, resolved)

lbschanno and others added 3 commits January 6, 2026 11:29

Add space in log message for better formatting

3602cb7

Co-authored-by: foster33 <84727868+foster33@users.noreply.github.com>

Remove trailing slash from param description

7a73d94

Co-authored-by: foster33 <84727868+foster33@users.noreply.github.com>

Allow systems to be configured with no limit

23e239a

lbschanno requested review from billoley and foster33 January 13, 2026 17:19

Merge branch 'integration' into task/queryLimit

0e1ce47

ivakegg reviewed Jan 21, 2026

web-services/query/src/main/java/datawave/webservice/query/limit/ActiveQueryTracker.java (resolved)

Add README to with details of the query limit feature

b7e0a6c

lbschanno requested a review from ivakegg January 22, 2026 15:42

Merge branch 'integration' into task/queryLimit

860d0b2

ivakegg requested changes Jan 26, 2026

Add cache synchronization safeguard

eea4334

lbschanno requested a review from ivakegg January 27, 2026 00:03

lbschanno and others added 3 commits January 26, 2026 19:10

Fix code formatting

4e28bd4

Fix incomplete documentation

94e2f67

Merge branch 'integration' into task/queryLimit

b9bbe98

ivakegg and others added 4 commits March 3, 2026 12:30

Updated to use systemFrom parameter for system query limits

a8f1335

Merge branch 'integration' into task/queryLimit

4fbb302

Remove references to system hostnames

2f27966

Corrected constant

ba23625

ivakegg added the Integration Tested label Mar 4, 2026

ivakegg approved these changes Mar 4, 2026

ivakegg added the linked label Mar 4, 2026

apmoriarty approved these changes Mar 4, 2026

lbschanno wants to merge 51 commits into integration from task/queryLimit

5 participants
